42 research outputs found
Engineering of increased L-Threonine production in bacteria by combinatorial cloning and machine learning
The goal of this study is to develop a general strategy for bacterial engineering using an integrated synthetic biology and machine learning (ML) approach. This strategy was developed in the context of increasing L-threonine production in Escherichia coli ATCC 21277. A set of 16 genes was initially selected based on metabolic pathway relevance to threonine biosynthesis and used for combinatorial cloning to construct a set of 385 strains to generate training data (i.e., a range of L-threonine titers linked to each of the specific gene combinations). Hybrid (regression/classification) deep learning (DL) models were developed and used to predict additional gene combinations in subsequent rounds of combinatorial cloning for increased L-threonine production based on the training data. As a result, E. coli strains built after just three rounds of iterative combinatorial cloning and model prediction generated higher L-threonine titers (from 2.7 g/L to 8.4 g/L) than those of patented L-threonine strains being used as controls (4–5 g/L). Interesting combinations of genes in L-threonine production included deletions of the tdh, metL, dapA, and dhaM genes as well as overexpression of the pntAB, ppc, and aspC genes. Mechanistic analysis of the metabolic system constraints for the best performing constructs offers ways to improve the models by adjusting weights for specific gene combinations. Graph theory analysis of pairwise gene modifications and corresponding levels of L-threonine production also suggests additional rules that can be incorporated into future ML models
Structure of a cupin protein Plu4264 from Photorhabdus luminescens subsp. laumondii TTO1 at 1.35 Å resolution
Proteins belonging to the cupin superfamily have a wide range of catalytic and noncatalytic functions. Cupin proteins commonly have the capacity to bind a metal ion with the metal frequently determining the function of the protein. We have been investigating the function of homologous cupin proteins that are conserved in more than 40 species of bacteria. To gain insights into the potential function of these proteins we have solved the structure of Plu4264 from Photorhabdus luminescens TTO1 at a resolution of 1.35 Å and identified manganese as the likely natural metal ligand of the protein
Evolution of substrate specificity in a retained enzyme driven by gene loss.
The connection between gene loss and the functional adaptation of retained proteins is still poorly understood. We apply phylogenomics and metabolic modeling to detect bacterial species that are evolving by gene loss, with the finding that Actinomycetaceae genomes from human cavities are undergoing sizable reductions, including loss of L-histidine and L-tryptophan biosynthesis. We observe that the dual-substrate phosphoribosyl isomerase A or priA gene, at which these pathways converge, appears to coevolve with the occurrence of trp and his genes. Characterization of a dozen PriA homologs shows that these enzymes adapt from bifunctionality in the largest genomes, to a monofunctional, yet not necessarily specialized, inefficient form in genomes undergoing reduction. These functional changes are accomplished via mutations, which result from relaxation of purifying selection, in residues structurally mapped after sequence and X-ray structural analyses. Our results show how gene loss can drive the evolution of substrate specificity from retained enzymes
Structural Characterization of CalS8, a TDP-α-D-Glucose Dehydrogenase Involved in Calicheamicin Aminodideoxypentose Biosynthesis
Classical UDP-glucose 6-dehydrogenases (UGDHs; EC 1.1.1.22) catalyze the conversion of UDP-α-D-glucose (UDP-Glc) to the key metabolic precursor UDP-α-D-glucuronic acid (UDP-GlcA) and display specificity for UDP-Glc. The fundamental biochemical and structural study of the UGDH homolog CalS8 encoded by the calicheamicin biosynthetic gene is reported and represents one of the first studies of a UGDH homolog involved in secondary metabolism. The corresponding biochemical characterization of CalS8 reveals CalS8 as one of the first characterized base-permissive UGDH homologs with a >15-fold preference for TDP-Glc over UDP-Glc. The corresponding structure elucidations of apo-CalS8 and the CalS8·substrate·cofactor ternary complex (at 2.47 and 1.95 Å resolution, respectively) highlight a notably high degree of conservation between CalS8 and classical UGDHs where structural divergence within the intersubunit loop structure likely contributes to the CalS8 base permissivity. As such, this study begins to provide a putative blueprint for base specificity among sugar nucleotide-dependent dehydrogenases and, in conjunction with prior studies on the base specificity of the calicheamicin aminopentosyltransferase CalG4, provides growing support for the calicheamicin aminopentose pathway as a TDP-sugar-dependent process
Data Standards for the Genomes to Life Program
Existing GTL Projects already have produced volumes of data and, over the course of the next five years, will produce an estimated hundreds, or possibly thousands, of terabytes of data from hundreds of experiments conducted at dozens of laboratories in National Labs and universities across the nation. These data will be the basis for publications by individual researchers, research groups, and multi-institutional collaborations, and the basis for future DOE decisions on funding further research in bioremediation. The short-term and long-term value of the data to project participants, to the DOE, and to the nation depends, however, on being able to access the data and on how, or whether, the data are archived. The ability to access data is the starting point for data analysis and interpretation, data integration, data mining, and development of data-driven models. Limited or inefficient data access means that less data are analyzed in a cost-effective and timely manner. Data production in the GTL Program will likely outstrip, or may have already outstripped, the ability to analyze the data. Being able to access data depends on two key factors: data standards and implementation of the data standards. For the purpose of this proposal, a data standard is defined as a standard, documented way in which data and information about the data are described. The attributes of the experiment in which the data were collected need to be known and the measurements corresponding to the data collected need to be described. In general terms, a data standard could be a form (electronic or paper) that is completed by a researcher or a document that prescribes how a protocol or experiment should be described in writing. Data standards are critical to data access because they provide a framework for organizing and managing data. Researchers spend significant amounts of time managing data and information about experiments using lab notebooks, computer files, Excel spreadsheets, etc. In addition, data output format varies for different equipment and usually needs to be formatted differently for the variety of computer programs used to display and analyze the data. If, however, data for a given type of experiment were converted from vendor format to a format defined by a data standard, then researchers and software developers could save time. In addition, if data and information describing how they were obtained were available in a consistent format throughout the GTL Program, comparison and integration of results would be facilitated and a data repository could be built to encourage project-wide data mining. Data standards also are essential for archiving data sets. If data are stored together with the experiment metadata (i.e., information about the data) in an 'information/data package', then the data retain their value due to the accessibility of information about measurement and analysis procedures. DOE's commitment to developing data standards for the GTL Program is needed to ensure that the most value is obtained from DOE's expenditures on experimental work and to provide a data repository that can be used as the basis for on-going model development. By developing data standards for experiments conducted as part of the GTL Program, DOE has the opportunity to facilitate data sharing not only within the DOE community, but also with research institutes throughout the world
Crystal Structure of the Zorbamycin-Binding Protein ZbmA, the Primary Self-Resistance Element in Streptomyces flavoviridis ATCC21892
The bleomycins (BLMs), tallysomycins (TLMs), phleomycin, and zorbamycin (ZBM) are members of the BLM family of glycopeptide-derived antitumor antibiotics. The BLM-producing Streptomyces verticillus ATCC15003 and the TLM-producing Streptoalloteichus hindustanus E465-94 ATCC31158 both possess at least two self-resistance elements, an N-acetyltransferase and a binding protein. The N-acetyltransferase provides resistance by disrupting the metal-binding domain of the antibiotic that is required for activity, while the binding protein confers resistance by sequestering the metal-bound antibiotic and preventing drug activation via molecular oxygen. We recently established that the ZBM producer, Streptomyces flavoviridis ATCC21892, lacks the N-acetyltransferase resistance gene and that the ZBM-binding protein, ZbmA, is sufficient to confer resistance in the producing strain. To investigate the resistance mechanism attributed to ZbmA, we determined the crystal structures of apo and Cu(II)-ZBM-bound ZbmA at high resolutions of 1.90 and 1.65 Å, respectively. A comparison and contrast with other structurally characterized members of the BLM-binding protein family revealed key differences in the protein–ligand binding environment that fine-tunes the ability of ZbmA to sequester metal-bound ZBM and supports drug sequestration as the primary resistance mechanism in the producing organisms of the BLM family of antitumor antibiotics
Structural Dynamics of a Methionine γ-lyase for Calicheamicin Biosynthesis: Rotation of the Conserved Tyrosine Stacking with Pyridoxal Phosphate
CalE6 from Micromonospora echinospora is a pyridoxal 5′-phosphate (PLP)-dependent methionine γ-lyase involved in the biosynthesis of calicheamicins. We report the crystal structure of a CalE6 2-(N-morpholino)ethanesulfonic acid complex showing ligand-induced rotation of Tyr100, which stacks with PLP, resembling the corresponding tyrosine rotation of true catalytic intermediates of CalE6 homologs. Elastic network modeling and crystallographic ensemble refinement reveal mobility of the N-terminal loop, which involves both tetrameric assembly and PLP binding. Modeling and comparative structural analysis of PLP-dependent enzymes involved in Cys/Met metabolism shed light on the functional implications of the intrinsic dynamic properties of CalE6 in catalysis and holoenzyme maturation
Fixed-target serial crystallography at the Structural Biology Center
Serial synchrotron crystallography enables the study of protein structures under physiological temperature and reduced radiation damage by collection of data from thousands of crystals. The Structural Biology Center at Sector 19 of the Advanced Photon Source has implemented a fixed-target approach with a new 3D-printed mesh-holder optimized for sample handling. The holder immobilizes a crystal suspension or droplet emulsion on a nylon mesh, trapping and sealing a near-monolayer of crystals in its mother liquor between two thin Mylar films. Data can be rapidly collected in scan mode and analyzed in near real-time using piezoelectric linear stages assembled in an XYZ arrangement, controlled with a graphical user interface and analyzed using a high-performance computing pipeline. Here, the system was applied to two β-lactamases: a class D serine β-lactamase from Chitinophaga pinensis DSM 2588 and L1 metallo-β-lactamase from Stenotrophomonas maltophilia K279a
Crystal structure of SgcJ, an NTF2-like superfamily protein involved in biosynthesis of the nine-membered enediyne antitumor antibiotic C-1027
Comparative analysis of the enediyne biosynthetic gene clusters revealed sets of conserved genes serving as outstanding candidates for the enediyne core. Here we report the crystal structures of SgcJ and its homologue NCS-Orf16, together with gene inactivation and site-directed mutagenesis studies, to gain insight into enediyne core biosynthesis. Gene inactivation in vivo establishes that SgcJ is required for C-1027 production in Streptomyces globisporus. SgcJ and NCS-Orf16 share a common structure with the nuclear transport factor 2-like superfamily of proteins, featuring a putative substrate binding or catalytic active site. Site-directed mutagenesis of the conserved residues lining this site allowed us to propose that SgcJ and its homologues may play a catalytic role in transforming the linear polyene intermediate, along with other enediyne polyketide synthase-associated enzymes, into an enzyme-sequestered enediyne core intermediate. These findings will help formulate hypotheses and design experiments to ascertain the function of SgcJ and its homologues in nine-membered enediyne core biosynthesis